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Abstract 

Finding and sampling multiple reaction channels for molecular transitions remains an important 
challenge in physical chemistry Here we show that the weighted ensemble (WE) path sampling 
method can readily sample multiple channels. In a first test, both the WE and transition path 
sampling methods are applied to two-dimensional model potentials. The comparison explains 
why the weighted ensemble approach will not be trapped in one channel. The WE approach 
is then used to sample the full transition path ensemble in implicitly solvated alanine dipeptide 
at two different temperatures. The ensembles are of sufficient quality to permit quantification 
of the fractional importance of each channel, even at T = 300K when brute-force simulation is 
prohibitively expensive. 
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I. INTRODUCTION 



Path sampling is an important strategy for studying study conformational transitions 
and activated processes. Some path sampling methods were applied to large biological 
system recently, for example, to study the conformational transitions of proteins. Confor- 
mational transition is important for protein to carry out their functions. But the pathway 
of these transitions are not easy to obtain either by experiments or simulations. Experi- 
ments, such as resolution studies by X-ray or NMR, are capable to determine the atomic 
structure of the stable states of protein, but not the transition states in the middle of the 
pathway. Conventional MD simulations are not good to study activated process because 
it involves a waiting step, which has been studied for a long time. The long waiting time 
to generate an ensemble of activated processes by straightforward brute-force approach is 
usually beyond the reach of today's computational power. In fact, if the activated process 
does take place, it is usually very quickly. Path sampling approaches take advantage of 
this property by focusing computer resources exclusively on rare transition events. 

A number of path generating and path sampling approaches have been developed. 
Some ad hoc path generating methods have been applied for large biological systems, like 
the targeted molecular dynamics (TMD) l^^^ and steered molecular dynamics (SMD). 

Another type of path sampling approaches focus on the optimal paths or the paths 
close to the optimal paths, such as the milestone method and the string method. Some 
path sampling methods take root in the idea of path integral methods, they sample 
the paths of rare events by evaluating their relative weights, such as, the transition 
path sampling (TPS),-^*2i2 the dynamic importance sampling (DIMS),-^^'^^ the transition 
interface sampling approach (TIS) ^^'^^ and the random walk in path space by following 
a pseudo Langevin equation. In principle those methods based on the calculation of 
the weights of paths can generate properly distributed path ensembles. Not long ago 
Zuckerman, Jasnow and Zhang showed the Weighted Ensemble (WE) method is 
a very promising simulation approach to investigate conformational transitions, which 
can lead to correct path ensembles and reaction rate simultaneously. WE is also an exact 
algorithm for many dynamics types. — 

Despite a number of successes, path sampling research still faces many challenges. One 
key difficulty is finding multiple reaction channels for a transition. — For example, the 
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transition path sampling approach, perhaps the best known method, can become trapped 
in a local minimum, and miss other channels. The transition interface sampling method 
has the same drawback as its ancestor, TPS. Recently Bolhuis and van Erp tried to solve 
this problem by using the combination of the replica exchange and the transition interface 
sampling methods (RETIS). 

The aim of this paper is to show the weighted ensemble method has no difficulty to 
sample multiple transition channels. The weighted ensemble method, which is based 
on unbiased replication and combination of brute-force simulations, does not have this 
drawback. 

This paper is set up as follows. In SecUD we introduce the models and review the meth- 
ods briefly. In SecHUl first the weighted ensemble and transition path sampling methods 
are applied to two-dimensional model potentials. This simple example highlights the 
differences between the two methods sampling of multiple channels. Then the weighted 
ensemble approach is tested to find the transition events between two stable structures 
of alanine dipeptide at high temperature 500K. The path ensembles are compared with 
those gained from brute-force simulations. At last the weighted ensemble method sam- 
ples the full transition ensemble of alanine dipeptide at room temperature 300K, which 
is prohibitively expensive for brute-force simulation. This paper ends with conclusions 
and discussion. 

XL MODELS AND METHODS 
A. Two-Dimensional Models 

We will show the difference of how the WE approach and TPS approach sample 
multiple channels by using two-dimensional model potentials. The toy potentials Ui and 
U2 are inspired by Chen, Nash and Horing's work, and defined via 

U{x,y)/kBT = a(x^ + y^-l)V 

- exp{-4[(x - If + y']} - exp{-4[(x + if + y']} 
+ exp[8(x - 1.5)] + 4 exp[-8(x + 1.5)] 

+ exp[-4(y + 0.25)] + 16 exp{-2x^) , (1) 
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with a - 20 and a = 72 respectively distinguishing Ui and U2. The contours of these 
two potentials are shown in Fig. [H They both have two channels connecting the left and 
right wells, which are shown by the red arrows. The only significant difference between 
these two potentials is the height of barrier separating two channels. Compared with 
the saddle points (~ 16/cbT and IVUbT in Fig. [1]), when a - 20, the barrier height is about 
SksT. And when a - 72, the barrier height is about lOksT. The WE approach (using the 
horizontal position x as the progress coordinate) and TPS approach are applied to study 
the transition events of an over-damped Brownian particle from the left well to the right 
well. The initial state and the final state are defined as the regions where the potential 
satisfies U{x, y) < {Umin + 2), where Umin is the lowest potential in the left and right wells. 

B. Alanine Dipeptide 

The second system on which we will test the weighted ensemble approach is alanine 
dipeptide (ace-ala-nme). The molecule is shown in Fig. [21 The principle variables describ- 
ing the structure of alanine dipeptide are two backbone dihedral angles: O (C-N-C-C) 
and W (N-C-C-N). Alanine dipeptide is frequently used for testing simulation methods 
and force fields. The reasons of choosing this molecule are as following. First it is one 
of the simplest molecules which contains two full peptide planes, so it will contains many 
structural features of protein backbones. — Second alanine dipeptide is small enough that 
its free energy surface can be studied thoroughly by different approaches. 2^i2it2^i2^t2Zi2^t22i2S 
Third the conformational transitions of alanine dipeptide contains multiple channels and 
have been studied by several groups recently, i^i^ 

Several brute force simulations are performed first, using Langevin dynamics in the 
CHARMM program to find the energy minimum states. The simulations use the 
"united atom model'' with the CHARMM parameter set 19 and implicit solvent ACE 
(analytical continuum electrostatics) model. The dihedral angles of the four energy 
minimum states we find are shown in Table. Il Our locations of minima are somewhat 
different with the previous study, i^^^^^ but it is known that the simulation of alanine 
dipeptide is very sensitive to solvent model, The WE approach is applied to study 
the transition events between state Cyeq and Cyax- The initial state Cyeq is defined as the 
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area closed by the circle 

[W - (-77.9)]^ + [O - (138.4)]^ = (40)^ , (2) 
and the final state Cyax is the area closed the circle 

[W - {61A)f + [O - (-71.4)]^ = (20)^ , (3) 

as shown in Fig. [3l 

We choose the dihedral angles W and O as the progress coordinates for the WE ap- 
proach. The two-dimensional space of dihedral angles is cut into a 12 x 12 grid, with 
20 simulations allowed in each grid. After every t = lOOfs, the embedded CHARMM 
simulations are paused, and the simulations are combined and split without bias. The 
weighted ensemble program was stopped after 2500t. 

III. RESULTS 

A. Two-Dimensional Models 

The TPS approach and the WE approach sample the paths in multiple channels in 
different ways. To show the difference, we use the y position of path crossing the x - 
section last time to identify which channel it belongs to, and plot this position for each path 
versus the path index (simulation time). Fig. |4](a) and Fig. |4](b) show the path switching 
of TPS method for potentials Ui and U2. The TPS method, as a Monte Carlo simulation, 
samples the channels (local minimum in the path space) one by one. For potential LZi, 
the switch happens about every 10^ paths. For potential U2; because of the higher barrier, 
the switch happens approximately every 5-10^ paths, which is much less frequent. This 
reveals how, for higher barriers, the sampling can get "trapped'' and could be biased. 
Fig. IHc) and Fig. |4](d) show the path switching of WE method for these two potentials. 
The frequent switches suggest that the Weighted Ensemble method, after a short transient 
period, generates different types of paths simultaneously in both potentials. 

Both methods are used to get the distribution of transition-event durations. The 
definition of the transition-event duration is the time interval between the last time the 
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Brownian particle leave the initial state and the first time it reaches the final state. It is the 
time the Brownian particle uses to finish the transition. The WE simulations are stopped 
after they generate 10^ paths. The TPS simulations are stopped after 2.5 • 10^ paths are 
obtained. The results are shown in Fig. [5l For small barrier (LZi), both methods get the 
correct distribution by using 10^ paths. But for potential LZ2, because of the high barrier 
separating the two types of paths, a "short'' transition path sampling simulation yields 
an incorrect distribution after 10^ paths have been sampled. In that case, the transition 
path sampling is trapped by the local minimum in path space; see Fig. |4](b). 

B. Alanine Dipeptide 

The WE method is applied to study the transition events between state Cyeq and Cyax 
of alanine dipeptide under temperature 300K and 500K. The brute-force simulations are 
also used to get the transition paths under 500k, and the results are compared with those 
given by WE method. 

1. The Transition Rate 

The reaction rate k is an important quantity for chemical and biological reactions 
and transitions. If the first passage time is too long, this quantity will be impossible 
to obtain by brute-force simulations. The WE method can yield the path ensemble and 
reaction rate simultaneously. Under 500K brute-force simulations obtained the reaction 
rate /cbf = 1.5 x 10~^ /ns. And WE method obtained the reaction rate /cwe = 1.4 x 10~^ /ns. 
They are in good agreement. When temperature is 300K, the reaction rate obtained by 
WE method is /cwe = 1-6 x 10"^/ns, which means compared with 500K, the transition is 
about 100 times more difficult to happen now. 

2. Paths in Different Channels 

The path ensemble will be studied in an enlarged dihedral plane (Fig. ^ which clearly 
shows alternative and continuous paths. The transition paths can be roughly divided into 
four types according to which of the (physically equivalent) Cyax states they end in this 
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extended dihedral plane. These four types of paths correspond to the combinations of 
clockwise and anticlockwise rotational directions of dihedral angles W and O. Different 
types of paths pass different barriers; therefore they belong to different channels. Under 
300K, approximately 100 paths randomly chosen (based on their weights) are plotted in 

Fig. [a 

The distributions of these four types of paths are listed in Table [III Once again under 
500K, the results given by WE method and brute-force are in good agreements. Because 
the enlarged plane is infinite, there are also infinite final Cyax states in it. Some paths ended 
out of these four closest final states, the distributions of them are listed in the "others'' 
column in the table. The path ensemble connecting to the 'Tower right" Cyax state is the 
most important type. And when the temperature decreased, it became more dominant. 
From Fig. [H we can tell there are two channels in this type of paths. They are divided 
apart by the high barrier around ^ = and = 0. 

IV. CONCLUSIONS 

Our results show that the weighted ensemble (WE) path sampling approach is naturally 
suited to the fundamental problem of sampling multiple reaction channels in molecular 
systems. The ability to sample multiple channels is more than an abstract statistical 
mechanics issue: after all, it would not seem possible to find an "optimal" path without 
the ability to fully traverse path space. 

Our studies employed toy systems to illustrate the basic mechanims underlying WE 
and the more familiar transition path sampling (TPS) methods, and then focused on a 
molecular system in detail. In the toy systems, the differences between WE and TPS were 
clear: because TPS is a Monte Carlo simulation, it can be trapped in one path channel. 
By contrast, the WE method is based on the unbiased replication and combination and 
of brute-force simulations, and cannot be trapped. The WE approach was also employed 
to find transition events between the Cyeq and Cyax states of atomistic alanine dipeptide at 
temperature 500K. The results were checked and supported by brute-force simulations. 
Finally, the WE method was employed to study the same transition at room temperature 
300K. A high-quality transition path ensemble including multiple, clearly separated 
channels was obtained. 
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We do not know of any previous report of a statistically rigorous path ensemble of 
alanine dipeptide. Put another way, this appears to be the first report of the fractional 
importance of the various channels to the path ensemble, despite the small size of the 
molecule. 

In the long term, we belive WE path sampling has key strengths, and also some weak- 
nesses. WE is easy to implement and suitable for use with almost type of stochastic 
dynamics — including MD with a stochastic thermostat. — The multiple-trajectory "ar- 
chitecture'' of WE makes it straightforward to parallelize. WE does not require precise 
knowledge of a reaction coordinate — indeed, no "targeting'' was used in the alanine 
dipeptide simulations reported here. Further, WE sampling simultaneously provides the 
reaction rate and path ensemble. Like TPS, however, WE produces correlated trajecto- 
ries in the path ensembles generated — although this does not prevent the method from 
achieving efficiency. Ultimately, the great flexibility of the approach, especially with 
regard to binning strategies suggests we are only at the beginning of appreciating the 
possibilities of Huber and Kim's seminal strategy. — 
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Figure 1: Contours of two-dimensional toy potentials. The left panel is the potential Ui with 
a = 20, the right panel is the potential U2 with a = 72. The red arrows show the different channels 
connecting the left and right wells. The numbers among the contours indicate energy values in 
ksT units for extrema and saddles. 
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(55.1,46.4) 


(-77.9,138.4) 


(-75.6,-39.9) 


(61.4,-71.4) 



Table I: Four energy minima of alanine dipeptide. 
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Figure 3: Four stable states of alanine dipeptide. The dihedral angles W and O are chosen as 
the progress coordinates for the WE approach. The two-dimensional space of dihedral angles is 
divided into a 12 x 12 grid for WE simulation, with 20 simulations allowed in each grid. 
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Figure 4: Channel-switching in TPS and WE simulations of toy models. Both methods were 
applied to the two toy potentials, Ui and U2 of Fig. [H The various panels show (a)TPS for LTi, 
(b)TPS for U2, (c)WE for Ui and (d)WE for 112- The TPS method samples the channels one by 
one. In panel (a), for potential LTi, the switch happens about every 10^ paths. In panel (b), for 
potential U2, the switch happens approximately every 5 • 10^ paths. But in panel (c) and (d), after 
a short transient period, WE simulations generate different types of paths simultaneously. Note 
the greatly enlarged horizontal scale in panel (b). 
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Figure 5: Distribution of transition-event durations in toy models. The left panel is for potential 
Ui (smaller barrier), the right panel is for potential U2 (larger barrier). Notice in the right panel, a 
"short" transition path sampling simulation yields an incorrect distribution. 




Figure 6: Four types of transition paths between the states Cyeq and Cyax of alanine dipeptide at 
300K. Approximately 100 paths are randomly chosen based on their weights (resampled) from the 
full path ensemble. Different paths traverse different barriers, and therefore belong to different 
channels. 
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upper left 


upper right 


lower left 


lower right 


others 


BF 500K 


4.2% 


25.6% 


5.3% 


64.1% 


0.8% 


WE 500K 


6.0% 


21.4% 


4.9% 


67.4% 


0.4% 


WE 300K 


0.6% 


12.3% 


0.9% 


86.2% 


0.0% 



Table II: Distributions of the four principal channels in alanine dipeptide. The position of the final 
Cvax state in the extended plane of Fig. [6] was used to identify the channnel. The fraction of paths 
which did not end in one of these four closest final states are listed in the "others" column. 



14 



